Abstract: Dual descent methods are used to solve network
optimization problems because descent directions can be com- 
puted in a distributed manner using information available 
either locally or at neighboring nodes. However, choosing a 
stepsize in the descent direction remains a challenge because its 
computation requires global information. This work presents 
an algorithm based on a local version of the Armijo rule that 
allows for the computation of a stepsize using only local and 
neighborhood information. We show that when our distributed 
line search algorithm is applied with a descent direction computed
according to the Accelerated Dual Descent method [18],
key properties of standard backtracking line search using the 
Armijo rule are recovered. We use simulations to demonstrate 
that our algorithm is a practical substitute for its centralized 
counterpart. 

I. Introduction 

Conventional approaches to distributed network optimiza- 
tion are based on iterative descent in either the primal or dual 
domain. The reason for this is that for many types of network 
optimization problems there exist descent directions that can 
be computed in a distributed fashion. Subgradient descent 
algorithms, for example, implement iterations through distributed
updates based on local information exchanges with
neighboring nodes; see, e.g., [7], [10], [12], [17]. However,
practical applicability of the resulting algorithms is limited 
by exceedingly slow convergence rates typical of gradient 
descent algorithms. Furthermore, since traditional line search
methods require global information, fixed stepsizes are used,
exacerbating the already slow convergence rate [14], [15].

Faster distributed descent algorithms have been recently 
developed by constructing approximations to the Newton 
direction using iterative local information exchanges [1],
[9], [18]. These results build on earlier work in [2] and
[11], which present Newton-type algorithms for network flow
problems that, unlike the more recent versions in [9]
and [18], require access to all network variables. To achieve
global convergence and recover the quadratic rate of the centralized
Newton's algorithm, [9] and [18] use distributed backtracking
line searches that use average consensus to verify global 
exit conditions. Since each backtracking line search step 
requires running a consensus iteration with consequently 
asymptotic convergence [6], [5], the exit conditions of the
backtracking line search can only be achieved up to some 
error. Besides introducing inaccuracies, computing stepsizes 
with a consensus iteration is not a suitable solution because 
the consensus iteration itself is slow. Thus, the quadratic 
convergence rate of the algorithms in [9] and [18] is to some
extent hampered by the linear convergence rate of the line 
search. This paper presents a distributed line search algorithm 
based on local information so that each node in the network 
can solve its own backtracking line search using only locally 
available information. 

Work on line search methods for descent algorithms can 
be found in [16], [19], [8]. The focus in [16] and [19] is on
nonmonotone line searches, which improve convergence rates
for Newton and Newton-like descent algorithms. The objective
in [8] is to avoid locally optimal solutions in nonconvex
problems. While these works provide insights for developing
line searches, they do not tackle the problem of dependence
on information that is distributed through the nodes of a graph.

To simplify discussion we restrict attention to the network 
flow problem. Network connectivity is modeled as a directed 
graph and the goal of the network is to support a single 
information flow specified by incoming rates at an arbitrary 
number of sources and outgoing rates at an arbitrary number 
of sinks. Each edge of the network is associated with a
convex function that determines the cost of traversing that
edge as a function of flow units transmitted across the link.
Our objective is to find the optimal flows over all links.
Optimal flows can be found by maximizing a concave reward,
the negative of the total cost, subject to linear equality
constraints (Section II).
Evaluating a line search algorithm requires us to choose 
a descent direction. We choose to work with the family 
of Accelerated Dual Descent (ADD) methods introduced in 
[18]. Algorithms in this family are parameterized by the
information dependence between nodes. The $N$th member
of the family, shorthanded as ADD-N, relies on information
exchanges with nodes not more than N hops away. Similarly, 
we propose a group of line searches that can be implemented 
through information exchanges with nodes in this N hop 
neighborhood. 

Our work is based on the Armijo rule, which is the
workhorse condition used in backtracking line searches [13,
Section 7.5]. We construct a local version of the Armijo
rule at each node by taking only the terms computable at
that node, using information from no more than N hops
away (Section III). Thus the line search always has the same
information requirements as the descent direction computed
via the ADD-N algorithm. Our proofs (Section IV) leverage
the information dependence properties of the algorithm to 
show that key properties of the backtracking line search are 
preserved: (i) We guarantee the selection of unit stepsize 
within a neighborhood of the optimal value (Section IV- 
A). (ii) Away from this neighborhood, we guarantee a strict 
decrease in the optimization objective (Section IV-B). These 
properties make our algorithm a practical distributed alternative
to standard backtracking line search techniques. Simulations
further demonstrate that our line search is functionally
equivalent to its centralized counterpart (Section V). 

II. Network Optimization 

Consider a network represented by a directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$
with node set $\mathcal{N} = \{1, \ldots, n\}$ and edge set
$\mathcal{E} = \{1, \ldots, E\}$. The $i$th component of vector $x$ is denoted as
$x^i$. The notation $x \ge 0$ means that all components $x^i \ge 0$.
The network is deployed to support a single information
flow specified by incoming rates $b^i > 0$ at source nodes and
outgoing rates $b^i < 0$ at sink nodes. Rate requirements are
collected in a vector $b$, which to ensure problem feasibility
has to satisfy $\sum_{i=1}^{n} b^i = 0$. Our goal is to determine a flow
vector $x = [x^e]_{e \in \mathcal{E}}$, with $x^e$ denoting the amount of flow on
edge $e = (i, j)$.

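Flow conservation implies that it must be $Ax = b$, with $A$
the $n \times E$ node-edge incidence matrix defined as

$$[A]_{ij} = \begin{cases} 1 & \text{if edge } j \text{ leaves node } i, \\ -1 & \text{if edge } j \text{ enters node } i, \\ 0 & \text{otherwise,} \end{cases}$$
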

where $[A]_{ij}$ denotes the element in the $i$th row and $j$th
column of the matrix $A$. We define the reward as the negative
of the scalar cost function $\phi_e(x^e)$, which denotes the cost of $x^e$ units
of flow traversing edge $e$. We assume that the cost functions
$\phi_e$ are strictly convex and twice continuously differentiable.
The maximum reward network optimization problem is then
defined as

$$\text{maximize } -f(x) = \sum_{e=1}^{E} -\phi_e(x^e) \quad \text{subject to: } Ax = b. \tag{1}$$
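
As a concrete illustration of the incidence matrix and the constraint $Ax = b$, the following minimal sketch builds $A$ for a small directed graph and checks flow conservation. The 4-node topology, the 0-based node indices, the rate vector and the helper name `incidence_matrix` are hypothetical choices made for this example only.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Node-edge incidence matrix: +1 where an edge leaves a node, -1 where it enters."""
    A = np.zeros((n_nodes, len(edges)))
    for e, (i, j) in enumerate(edges):
        A[i, e] = 1.0    # edge e leaves node i
        A[j, e] = -1.0   # edge e enters node j
    return A

# Hypothetical example network (nodes are 0-indexed here).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
A = incidence_matrix(4, edges)

b = np.array([1.0, 0.0, 0.0, -1.0])      # one source, one sink; entries sum to zero
x = np.array([0.5, 0.5, 0.5, 0.5, 0.0])  # a feasible flow splitting the unit rate
residual = A @ x - b                     # zero when flow is conserved at every node
```

Every column of `A` sums to zero, which is why feasibility requires the entries of `b` to sum to zero as well.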



Our goal is to investigate a distributed line search technique
for use with Accelerated Dual Descent (ADD) methods
for solving the optimization problem in (1). We begin by
discussing the Lagrange dual problem of the formulation
in (1) in Section II-A and reviewing the ADD method in
Section II-B.

A. Dual Formulation 

Dual descent algorithms solve (1) by descending on the
Lagrange dual function $q(\lambda)$. To construct the dual function
consider the Lagrangian $\mathcal{L}(x, \lambda) = -\sum_{e=1}^{E} \phi_e(x^e) + \lambda'(Ax - b)$
and define

$$q(\lambda) = \sup_{x \in \mathbb{R}^E} \mathcal{L}(x, \lambda) = \sup_{x \in \mathbb{R}^E} \left( -\sum_{e=1}^{E} \phi_e(x^e) + \lambda' A x \right) - \lambda' b = \sum_{e=1}^{E} \sup_{x^e \in \mathbb{R}} \left( (\lambda' A)^e x^e - \phi_e(x^e) \right) - \lambda' b, \tag{2}$$

where in the last equality we wrote $\lambda' A x = \sum_{e=1}^{E} (\lambda' A)^e x^e$
and exchanged the order of the sum and supremum operators.
It can be seen from (2) that the evaluation of the dual
function $q(\lambda)$ decomposes into the $E$ one-dimensional optimization
problems that appear in the sum. We assume that
each of these problems has an optimal solution, which is
unique because of the strict convexity of the functions $\phi_e$.
Denote this unique solution as $x^e(\lambda)$ and use the first order
optimality conditions for these problems in order to write

$$x^e(\lambda) = (\phi_e')^{-1}(\lambda^i - \lambda^j), \tag{3}$$

where $i \in \mathcal{N}$ and $j \in \mathcal{N}$ respectively denote the source and
destination nodes of edge $e = (i, j)$. As per (3) the evaluation
of $x^e(\lambda)$ for each edge $e$ is based on local information about
the edge cost function $\phi_e$ and the dual variables of the
incident nodes $i$ and $j$.

The dual problem of (1) is defined as $\min_{\lambda \in \mathbb{R}^n} q(\lambda)$.
The dual function is convex, because it is a pointwise supremum
of functions that are affine in $\lambda$, and differentiable, because the
$\phi_e$ functions are strictly convex. Therefore, the dual problem
can be solved using any descent algorithm of the form

$$\lambda_{k+1} = \lambda_k + \alpha_k d_k \quad \text{for all } k \ge 0, \tag{4}$$

where the descent direction $d_k$ satisfies $g_k' d_k < 0$ for all
times $k$ with $g_k = g(\lambda_k) = \nabla q(\lambda_k)$ denoting the gradient of
the dual function $q(\lambda)$ at $\lambda = \lambda_k$. An important observation
here is that we can compute the elements of $g_k$ as

$$g_k^i = \sum_{e=(i,j)} x^e(\lambda_k) - \sum_{e=(j,i)} x^e(\lambda_k) - b^i, \tag{5}$$

with the vector $x(\lambda_k)$ having components $x^e(\lambda_k)$ as determined
by (3) with $\lambda = \lambda_k$ [3, Section 6.4]. An important
fact that follows from (5) is that the $i$th element $g_k^i$ of
the gradient $g_k$ can be computed using information that
is either locally available, $x^{(i,j)}$, or available at neighbors,
$x^{(j,i)}$. Thus, the simplest distributed dual descent algorithm,
known as subgradient descent, takes $d_k = -g_k$. Subgradient
descent suffers from slow convergence so we work with an
approximate Newton direction.
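
To make (3) and (5) concrete, the sketch below evaluates the per-edge flows and the dual gradient for the cost $\phi_e(x) = e^{x} + e^{-x}$ (the cost used in Section V with $c = 1$), for which $\phi_e'(x) = 2\sinh(x)$ and hence $(\phi_e')^{-1}(y) = \operatorname{arcsinh}(y/2)$. The 3-node path network and the function names are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical path network 0 -> 1 -> 2 (rows: nodes, columns: edges).
A = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, -1.0]])
b = np.array([1.0, 0.0, -1.0])

def primal_flows(A, lam):
    """Per-edge maximizers x^e(lam) from (3) for phi(x) = exp(x) + exp(-x)."""
    price_diff = A.T @ lam                 # (lam' A)^e = lam^i - lam^j for e = (i, j)
    return np.arcsinh(price_diff / 2.0)    # inverse of phi'(x) = 2 sinh(x)

def dual_value(A, b, lam):
    """Dual function (2) evaluated at the maximizing flows."""
    x = primal_flows(A, lam)
    phi = np.exp(x) + np.exp(-x)
    return float(np.sum((A.T @ lam) * x - phi) - lam @ b)

def dual_gradient(A, b, lam):
    """Gradient (5): net outflow at each node minus its rate requirement."""
    return A @ primal_flows(A, lam) - b
```

Row $i$ of `A @ x` only touches edges incident to node $i$, which mirrors the locality argument made above for $g_k^i$.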

B. Accelerated Dual Descent 

The Accelerated Dual Descent (ADD) method is a parameterized
family of dual descent algorithms developed in [18].
An algorithm in the ADD family is called ADD-N and each
node uses information from N-hop neighbors to compute
its portion of an approximate Newton direction. Two nodes
are N-hop neighbors if the shortest undirected path between
those nodes is less than or equal to N.

The exact Newton direction $d_k$ is defined as the solution
of the linear equation $H_k d_k = -g_k$ where $H_k = H(\lambda_k) = \nabla^2 q(\lambda_k)$
denotes the Hessian of the dual function. We approximate
$d_k$ using the ADD-N direction defined as

$$d_k^{(N)} = -\bar{H}_k^{(N)} g_k, \tag{6}$$

where the approximate Hessian inverse $\bar{H}_k^{(N)}$ is defined as

$$\bar{H}_k^{(N)} = \sum_{r=0}^{N} D_k^{-1/2} \left( D_k^{-1/2} B_k D_k^{-1/2} \right)^r D_k^{-1/2}, \tag{7}$$

using a Hessian splitting $H_k = D_k - B_k$, where $D_k$ is the
diagonal matrix with $[D_k]_{ii} = [H_k]_{ii}$. The resulting accelerated
dual descent algorithm

$$\lambda_{k+1} = \lambda_k + \alpha_k d_k^{(N)} \quad \text{for all } k \ge 0, \tag{8}$$

can be computed using information from N-hop neighbors
because the dependence structure of $g_k$ shown in equation
(5) causes the Hessian to have a local structure as well:
$[H_k]_{ij} \ne 0$ if and only if $(i, j) \in \mathcal{E}$. Since $H_k$ has the
sparsity pattern of the network, $B_k$ and thus $D_k^{-1/2} B_k D_k^{-1/2}$
must also have the sparsity pattern of the graph. Each term
$D_k^{-1/2} ( D_k^{-1/2} B_k D_k^{-1/2} )^r D_k^{-1/2}$ is a matrix which is non-zero
only for $r$-hop neighbors so the sum is non-zero only for
N-hop neighbors.
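
The truncated series in (7) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `add_inverse` is ours, and the test matrix is a hypothetical path-graph Laplacian shifted by $0.5 I$ so that it is invertible, whereas the actual dual Hessian is singular along the all-ones direction.

```python
import numpy as np

def add_inverse(H, N):
    """Truncated series (7): sum_{r=0}^{N} D^{-1/2} (D^{-1/2} B D^{-1/2})^r D^{-1/2}."""
    d = np.diag(H)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    B = np.diag(d) - H                    # splitting H = D - B
    S = D_inv_sqrt @ B @ D_inv_sqrt
    term = np.eye(H.shape[0])             # S^0
    total = np.zeros(H.shape)
    for _ in range(N + 1):
        total += term
        term = term @ S                   # next power of S
    return D_inv_sqrt @ total @ D_inv_sqrt

# Hypothetical stand-in for a dual Hessian: path graph 0-1-2-3, regularized.
H = np.array([[1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]]) + 0.5 * np.eye(4)

H_bar_1 = add_inverse(H, 1)   # couples only 1-hop neighbors
H_bar_2 = add_inverse(H, 2)   # couples nodes up to 2 hops apart
```

In this example `H_bar_1[0, 2]` is zero because nodes 0 and 2 are two hops apart, while `H_bar_2[0, 2]` is not, which is the sparsity argument made in the text.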

Analysis of the ADD-N algorithm fundamentally depends
on a network connectivity coefficient $\bar{\rho}$, which is defined in
[18] as the bound

$$\rho(B_k D_k^{-1}) \le \bar{\rho} \in (0, 1), \tag{9}$$

where $\rho(\cdot)$ denotes the second largest eigenvalue modulus.
When $\bar{\rho}$ is small, information in the network spreads efficiently
and $d_k^{(N)}$ is a more exact approximation of $d_k$. See
[18] for details.



III. Distributed Backtracking Line Search 

Algorithms ADD-N for different N differ in their information
dependence. Our goal is to develop a family
of distributed backtracking line searches parameterized by
the same N and having the same information dependence.
The idea is that the $N$th member of the family of line
searches is used in conjunction with the $N$th member of
the ADD family to determine the step and descent direction
in (8). As with the ADD-N algorithm, implementing the
distributed backtracking line search requires each node to
get information from its N-hop neighbors.

Centralized backtracking line searches are typically intended
as a method to find a stepsize $\alpha$ that satisfies Armijo's
rule. This rule requires the stepsize $\alpha$ to satisfy the inequality

$$q(\lambda + \alpha d) \le q(\lambda) + \sigma \alpha d' g, \tag{10}$$

for given descent direction $d$ and search parameter $\sigma \in (0, 1/2)$.
The backtracking line search algorithm is then
defined as follows:

Algorithm 1. Consider the objective function $q(\cdot)$ and given
variable value $\lambda$ and corresponding descent direction $d$ and
dual gradient $g$. The backtracking line search algorithm is:

    Initialize $\alpha = 1$
    while $q(\lambda + \alpha d) > q(\lambda) + \sigma \alpha d' g$
        $\alpha = \alpha \beta$
    end

The scalars $\beta \in (0, 1)$ and $\sigma \in (0, 1/2)$ are given parameters.
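
A minimal runnable sketch of Algorithm 1 follows. The quadratic test objective, the parameter defaults $\sigma = 0.25$, $\beta = 0.5$ and the function name are hypothetical; the loop assumes $d$ is a descent direction, since otherwise the exit condition may never be met.

```python
import numpy as np

def backtracking_line_search(q, lam, d, g, sigma=0.25, beta=0.5):
    """Centralized backtracking: shrink alpha until Armijo's rule (10) holds."""
    alpha = 1.0
    q0 = q(lam)
    slope = float(d @ g)                  # d'g, negative for a descent direction
    while q(lam + alpha * d) > q0 + sigma * alpha * slope:
        alpha *= beta
    return alpha

# Hypothetical test problem: q(lam) = 0.5 * lam' Q lam.
Q = np.diag([1.0, 10.0])
q = lambda lam: 0.5 * lam @ Q @ lam
lam = np.array([1.0, 1.0])
g = Q @ lam

alpha_newton = backtracking_line_search(q, lam, -np.linalg.solve(Q, g), g)
alpha_gradient = backtracking_line_search(q, lam, -g, g)
```

On this example the Newton direction is accepted with a unit step, while the plain gradient direction has to be backtracked three times to $\alpha = 0.125$.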



In order to create a distributed version of the backtracking
line search we need a local version of the Armijo rule. We
start by decomposing the dual objective $q(\lambda) = \sum_{i=1}^{n} q_i(\lambda)$,
where, so that summing over $i$ recovers (2), the local objectives take the form

$$q_i(\lambda) = -\sum_{e=(i,j)} \phi_e(x^e) + \lambda^i (a_i' x - b^i). \tag{11}$$

The vector $a_i'$ is the $i$th row of the incidence matrix $A$. Thus
the local objective $q_i(\lambda)$ depends only on the flows adjacent
to node $i$ and $\lambda^i$.

An N-parameterized local Armijo rule is therefore given
by

$$q_i(\lambda + \alpha_i d) \le q_i(\lambda) + \sigma \alpha_i \sum_{j \in \mathcal{N}_i^{(N)}} d^j g^j, \tag{12}$$

where $\mathcal{N}_i^{(N)}$ is the set of N-hop neighbors of node $i$. The
scalar $\sigma \in (0, 1/2)$ is the same as in (10), $g = \nabla q(\lambda)$ and
$d$ is a descent direction. Each node is able to compute a
stepsize $\alpha_i$ satisfying (12) using N-hop information. The
stepsize used for the dual descent update (8) is

$$\alpha = \min_{i \in \mathcal{N}} \alpha_i. \tag{13}$$

Therefore, we define the distributed backtracking line search
according to the following algorithm.

Algorithm 2. Given local objectives $q_i(\cdot)$, descent direction
$d$ and dual gradient $g$.

    for i = 1 : n
        Initialize $\alpha_i = 1$
        while $q_i(\lambda + \alpha_i d) > q_i(\lambda) + \sigma \alpha_i \sum_{j \in \mathcal{N}_i^{(N)}} d^j g^j$
            $\alpha_i = \alpha_i \beta$
        end
    end
    $\alpha = \min_i \alpha_i$

The scalars $\beta \in (0, 1)$, $\sigma \in (0, 1/2 - \bar{\rho}^{N+1}/2)$ and $N \in \mathbb{Z}_+$
are parameters.
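
Algorithm 2 can be sketched as follows. This is an illustration under loud assumptions, not the implementation used in the simulations: the cost is taken to be the quadratic $\phi_e(x) = x^2/2$ so that $x^e(\lambda) = \lambda^i - \lambda^j$, the local objective uses the sign convention that sums to (2), the network is a hypothetical 3-node path, the direction is the plain negative gradient, and the `max_iter` guard is our addition because the local exit condition is only guaranteed by the analysis for ADD-N directions.

```python
import numpy as np
from collections import deque

def n_hop_neighborhood(adj, i, N):
    """Nodes within N undirected hops of node i (including i), by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == N:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)

def distributed_backtracking(q_local, adj, lam, d, g, N=1, sigma=0.2, beta=0.5, max_iter=50):
    """Each node backtracks on its own local Armijo rule (12); the step is the minimum (13)."""
    alphas = []
    for i in range(len(lam)):
        hood = n_hop_neighborhood(adj, i, N)
        slope = sum(d[j] * g[j] for j in hood)   # uses only N-hop entries of d and g
        alpha_i, q0 = 1.0, q_local(i, lam)
        for _ in range(max_iter):                # guard added for this sketch
            if q_local(i, lam + alpha_i * d) <= q0 + sigma * alpha_i * slope:
                break
            alpha_i *= beta
        alphas.append(alpha_i)
    return min(alphas), alphas

# Hypothetical path network 0 -> 1 -> 2 with quadratic edge costs.
A = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, -1.0])
adj = {0: [1], 1: [0, 2], 2: [1]}

def q_local(i, lam):
    x = A.T @ lam                                # flows for phi(x) = x^2 / 2
    leaving = A[i] > 0                           # edges e = (i, j) leaving node i
    return -0.5 * np.sum(x[leaving] ** 2) + lam[i] * (A[i] @ x - b[i])

lam = np.zeros(3)
g = A @ (A.T @ lam) - b                          # dual gradient (5)
alpha, alphas = distributed_backtracking(q_local, adj, lam, -g, g)
```

In this toy instance nodes 0 and 1 accept $\alpha_i = 1$ and node 2 backtracks once, so the common stepsize is $0.5$.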

The distributed backtracking line search described in Algorithm 2
works by allowing each node to execute its own
modified version of Algorithm 1 using only information from
N-hop neighbors. Minimum consensus of $\alpha_i$ requires at most
$\mathrm{diam}(\mathcal{G})$ iterations. If each node shares its current
$\alpha_i$ along with $g^i$ with its N-hop neighbors, the maximum
number of iterations drops to $\lceil \mathrm{diam}(\mathcal{G}) / N \rceil$.

The parameter $\sigma$ is restricted by the network connectivity
coefficient $\bar{\rho}$ and the choice of N because these are scalars
which encode information availability. Smaller $\bar{\rho}^{N+1}$ indicates
more accessible information and thus allows for greater
$\sigma$ and thus a more aggressive search. As $\bar{\rho}^{N+1}$ approaches
zero, we recover the condition $\sigma \in (0, 1/2)$ from Algorithm 1.
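
The minimum consensus step mentioned above can be simulated as follows. The sketch assumes a connected graph and 1-hop exchanges; the path topology and the local stepsizes are hypothetical, and the global minimum is used only by the simulation to count rounds, not by the nodes themselves.

```python
def min_consensus(adj, values):
    """Synchronous min-consensus: each round a node keeps the minimum over itself and its neighbors."""
    current = list(values)
    target = min(current)          # known to the simulator only, used to count rounds
    rounds = 0
    while any(v != target for v in current):
        current = [min([current[i]] + [current[j] for j in adj[i]])
                   for i in range(len(current))]
        rounds += 1
    return current, rounds

# Hypothetical path graph 0-1-2-3 (diameter 3) with local stepsizes alpha_i.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final, rounds = min_consensus(adj, [1.0, 1.0, 1.0, 0.25])
```

Here the smallest stepsize starts at one end of the path, so it takes exactly three rounds, the diameter of the graph, to reach every node.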

IV. Analysis 



The centralized line search in Algorithm 1 is commonly used with
Newton's method because it guarantees a strict decrease in the
objective and, once in an error neighborhood, it always selects
$\alpha = 1$, allowing for quadratic convergence [4, Section 9.5].
In this section we show that, when implemented with
the Accelerated Dual Descent update in (8), the distributed
backtracking line search defined in Algorithm 2 recovers
the key properties of Algorithm 1: strict decrease of the
dual objective and selection of $\alpha = 1$ within an error
neighborhood.

We proceed by outlining our assumptions. The standard
Lipschitz and strict convexity assumptions regarding the dual
Hessian are defined here.

Assumption 1. The Hessian $H(\lambda)$ of the dual function $q(\lambda)$
satisfies the following conditions:

(Lipschitz dual Hessian) There exists some constant $L > 0$
such that

$$\|H(\lambda) - H(\bar{\lambda})\| \le L \|\lambda - \bar{\lambda}\| \quad \forall\, \lambda, \bar{\lambda} \in \mathbb{R}^n.$$

(Strictly convex dual function) There exists some constant
$M > 0$ such that $\|H(\lambda)^{-1}\| \le M$ for all $\lambda \in \mathbb{R}^n$.

In addition to assumptions about the dual Hessian we
assume that key properties of the inverse Hessian carry
forward to our approximation.

Assumption 2. The approximate inverse Hessian remains
well conditioned,

$$m \le \|\bar{H}^{(N)}\| \le M,$$

within the subspace $\mathbf{1}^{\perp}$.

These assumptions make sense because $\bar{H}^{(N)}$ is a truncated
sum whose limit as N approaches infinity is $H^{-1}$,
a matrix we already assume to be well conditioned on $\mathbf{1}^{\perp}$
even when solving this problem in the centralized case.
Furthermore the first term in the sum is $D^{-1}$, which is well
conditioned by construction.

We begin our analysis by characterizing the stepsize $\alpha$
chosen by Algorithm 2 when the descent direction $d$ is chosen
according to the ADD-N method.

Lemma 1. For any $\alpha_i$ satisfying the distributed Armijo rule
in equation (12) with descent direction $d = -\bar{H}^{(N)} g$ we have

$$q_i(\lambda + \alpha_i d) - q_i(\lambda) \le 0.$$

Proof: Recall that $\bar{H}^{(N)}$ is non-zero only for elements
corresponding to N-hop neighbors by construction. Therefore,
by defining the local gradient vector $\tilde{g}^{(i)}$ as a sparse
vector with nonzero elements $[\tilde{g}^{(i)}]_j = g^j$ for $j \in \mathcal{N}_i^{(N)}$, we
can write

$$\sum_{j \in \mathcal{N}_i^{(N)}} d^j g^j = -(\tilde{g}^{(i)})' \bar{H}^{(N)} \tilde{g}^{(i)}. \tag{14}$$

Because $\bar{H}^{(N)}$ is positive definite the right hand side of (14)
is nonpositive, from where it follows that $\sum_{j \in \mathcal{N}_i^{(N)}} d^j g^j \le 0$.
The desired result follows by noting that $\alpha_i$ and $\sigma$ are positive
scalars. ■

Lemma 1 tells us that when using the distributed backtracking
line search with the ADD-N algorithm, we achieve
improvement in each element of the decomposed objective
$q_i(\lambda)$. From the quadratic form in equation (14) it also
follows that if equation (12) is satisfied by a stepsize $\alpha_i$,
then it is also satisfied by any $\alpha \le \alpha_i$ and in particular
$\alpha = \min_i \alpha_i$ satisfies equation (12) for all $i$.



A. Unit Stepsize Phase 

A fundamental property of the backtracking line search using
Armijo's rule summarized in Algorithm 1 is that it always
selects $\alpha = 1$ when iterates $\lambda$ are within a neighborhood of
the optimal argument. This property is necessary to ensure
quadratic convergence of Newton's method and is therefore a
desirable property for the distributed line search summarized
in Algorithm 2. We prove here that this is true as stated in
the following theorem.

Theorem 1. Consider the distributed line search in Algorithm 2
with parameter N, starting point $\lambda = \lambda_k$, and descent
direction $d = d_k^{(N)} = -\bar{H}_k^{(N)} g_k$ computed by the ADD-N
algorithm [cf. (6) and (7)]. If the search parameter $\sigma$ is chosen
such that

$$\sigma \in \left( 0, \frac{1 - \bar{\rho}^{N+1}}{2} \right)$$

and the norm of the dual gradient satisfies

$$\|g_k\| \le \frac{3m}{L M^3} \left( 1 - \bar{\rho}^{N+1} - 2\sigma \right),$$

then Algorithm 2 selects stepsize $\alpha = 1$.



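Proof: Recall the definition of the local gradient $\tilde{g}_k^{(i)}$
as the sparse vector with nonzero elements $[\tilde{g}_k^{(i)}]_j = g_k^j$ for
$j \in \mathcal{N}_i^{(N)}$. Further define the local update vector
$\tilde{d}_k^{(i)} := -\bar{H}_k^{(N)} \tilde{g}_k^{(i)}$, whose sparsity pattern is the same as that of $\tilde{g}_k^{(i)}$.
Due to this and to the fact that the local objective $q_i(\lambda)$ in
(11) depends only on values in $\mathcal{N}_i^{(N)}$, we have

$$q_i(\lambda_k + \alpha d_k) = q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}). \tag{15}$$

Applying the Lipschitz dual Hessian assumption to the local
update vector $\tilde{d}_k^{(i)}$ we get

$$\|H(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H(\lambda_k)\| \le \alpha L \|\tilde{d}_k^{(i)}\|. \tag{16}$$

We further define a reduced Hessian $\nabla^2 q_i(\lambda) = H^{(i)}$ by
setting to zero the rows and columns corresponding to nodes
outside of the neighborhood $\mathcal{N}_i^{(N)}$, i.e.,

$$[H^{(i)}]_{jl} = \begin{cases} [H]_{jl} & \text{if } j, l \in \mathcal{N}_i^{(N)}, \\ 0 & \text{else.} \end{cases} \tag{17}$$

Since the elements of $H$ already satisfy $H_{ij} = 0$ for all
$(i, j) \notin \mathcal{E}$, the resulting $H^{(i)}$ has the structure of a principal
submatrix of $H$ with the deleted rows left as zeros. Since
the norm $\|H(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H(\lambda_k)\|$ in (16) is the maximum
eigenvalue modulus of the matrix $H(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H(\lambda_k)$, it is
larger than the norm $\|H^{(i)}(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H^{(i)}(\lambda_k)\|$ because
the latter is the maximum over a subset of the eigenvalues
of the former. Combining this observation with (16) yields

$$\|H^{(i)}(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H^{(i)}(\lambda_k)\| \le \alpha L \|\tilde{d}_k^{(i)}\|. \tag{18}$$

Interpret now the update in (15) as a function $\tilde{q}_i(\alpha)$ defined
as

$$\tilde{q}_i(\alpha) := q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}). \tag{19}$$

Differentiating with respect to $\alpha$ and using the definition of
the local gradient $\tilde{g}_k^{(i)}$ we get the derivative of $\tilde{q}_i(\alpha)$ as

$$\tilde{q}_i'(\alpha) = \nabla q_i(\lambda_k + \alpha \tilde{d}_k^{(i)})' \tilde{d}_k^{(i)} = \tilde{g}^{(i)}(\lambda_k + \alpha \tilde{d}_k^{(i)})' \tilde{d}_k^{(i)}. \tag{20}$$

Differentiating with respect to $\alpha$ a second time and using the
definition of $H^{(i)}$ in (17) yields

$$\tilde{q}_i''(\alpha) = (\tilde{d}_k^{(i)})' \nabla^2 q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) \tilde{d}_k^{(i)} = (\tilde{d}_k^{(i)})' H^{(i)}(\lambda_k + \alpha \tilde{d}_k^{(i)}) \tilde{d}_k^{(i)}. \tag{21}$$

Return now to (18) and replace the matrix norm on the left
hand side with left and right multiplication by the unit vector
$\tilde{d}_k^{(i)} / \|\tilde{d}_k^{(i)}\|$. This yields

$$(\tilde{d}_k^{(i)})' \left[ H^{(i)}(\lambda_k + \alpha \tilde{d}_k^{(i)}) - H^{(i)}(\lambda_k) \right] \tilde{d}_k^{(i)} \le \alpha L \|\tilde{d}_k^{(i)}\|^3. \tag{22}$$

Comparing the expressions for the derivatives $\tilde{q}_i''(\alpha)$ in (21)
with the left hand side of (22) we can simplify the latter to

$$\tilde{q}_i''(\alpha) - \tilde{q}_i''(0) \le \alpha L \|\tilde{d}_k^{(i)}\|^3.$$

Integrating the above expression with respect to $\alpha$ results in

$$\tilde{q}_i'(\alpha) - \tilde{q}_i'(0) \le \frac{\alpha^2}{2} L \|\tilde{d}_k^{(i)}\|^3 + \alpha \tilde{q}_i''(0),$$

which upon a second integration with respect to $\alpha$ yields

$$\tilde{q}_i(\alpha) - \tilde{q}_i(0) \le \frac{\alpha^3}{6} L \|\tilde{d}_k^{(i)}\|^3 + \frac{\alpha^2}{2} \tilde{q}_i''(0) + \alpha \tilde{q}_i'(0).$$

Since we are interested in unit stepsize, substitute $\alpha = 1$ and
the definitions of the derivatives $\tilde{q}_i'(0)$ and $\tilde{q}_i''(0)$ given in
(20) and (21) to get

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \frac{L}{6} \|\tilde{d}_k^{(i)}\|^3 + \frac{1}{2} (\tilde{d}_k^{(i)})' H^{(i)}(\lambda_k) \tilde{d}_k^{(i)} + (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)}.$$

Since according to (17) the reduced Hessian $H^{(i)}$ has the
structure of a principal submatrix of the Hessian $H$ and $H \succeq 0$,
it follows that $0 \preceq H^{(i)} \preceq H$ and that as a consequence

$$(\tilde{d}_k^{(i)})' H^{(i)}(\lambda_k) \tilde{d}_k^{(i)} \le (\tilde{d}_k^{(i)})' H_k \tilde{d}_k^{(i)}.$$

Incorporating this latter relation and the definition of the local
update $\tilde{d}_k^{(i)} = -\bar{H}_k^{(N)} \tilde{g}_k^{(i)}$ in the previous equation we obtain

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \frac{L}{6} \|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|^3 + \frac{1}{2} (\bar{H}_k^{(N)} \tilde{g}_k^{(i)})' H_k \bar{H}_k^{(N)} \tilde{g}_k^{(i)} - (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)}. \tag{23}$$

Consider now the last term in the right hand side and recall
the sparsity pattern of the local gradient $\tilde{g}_k^{(i)}$ to write

$$-(\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)} = \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j, \tag{24}$$

and further split the right hand side of (24) to generate
suitable structure

$$\sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j = \sum_{j \in \mathcal{N}_i^{(N)}} \left[ \sigma g_k^j d_k^j + (1 - \sigma) g_k^j d_k^j \right]. \tag{25}$$

Substitute now (25) into (24) and the result into (23) to write

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \frac{L}{6} \|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|^3 + \frac{1}{2} (\tilde{d}_k^{(i)})' H_k \tilde{d}_k^{(i)} + \sigma \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j + (1 - \sigma) \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j.$$

Using the expression for the quadratic form in (24) to
substitute the last term in the previous equation yields

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \frac{L}{6} \|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|^3 + \frac{1}{2} (\tilde{d}_k^{(i)})' H_k \tilde{d}_k^{(i)} + \sigma \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j - (1 - \sigma) (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)}. \tag{26}$$

Further note that from the definition of $\tilde{d}_k^{(i)}$ it follows that

$$(\tilde{d}_k^{(i)})' H_k \tilde{d}_k^{(i)} = (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} H_k \bar{H}_k^{(N)} \tilde{g}_k^{(i)}.$$

The right hand side of this latter equality can be bounded
using the Cauchy-Schwarz inequality and the submultiplicativity
of matrix norms as

$$(\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} H_k \bar{H}_k^{(N)} \tilde{g}_k^{(i)} \le \|\tilde{g}_k^{(i)}\| \, \|\bar{H}_k^{(N)}\| \, \|H_k \bar{H}_k^{(N)}\| \, \|\tilde{g}_k^{(i)}\|.$$

The norm $\|H_k \bar{H}_k^{(N)}\|$ can be further bounded using the result
$\|H_k \bar{H}_k^{(N)}\| \le \bar{\rho}^{N+1} + 1$ from [18]. The norm $\|\bar{H}_k^{(N)}\|$ can be
bounded as $\|\bar{H}_k^{(N)}\| \le M$ according to Assumption 2. These
two observations substituted in the last displayed equation
yield

$$(\tilde{d}_k^{(i)})' H_k \tilde{d}_k^{(i)} \le M (\bar{\rho}^{N+1} + 1) \|\tilde{g}_k^{(i)}\|^2. \tag{27}$$

Applying the bound $\|\bar{H}_k^{(N)}\| \le M$ from Assumption 2 to the
norm $\|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|$ we get $\|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|^3 \le M^3 \|\tilde{g}_k^{(i)}\|^3$. Since
Assumption 2 also guarantees that $\|\bar{H}_k^{(N)}\| \ge m$, we have

$$\frac{(\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)}}{\|\tilde{g}_k^{(i)}\|^2} \ge m.$$

Therefore, we can write

$$\|\bar{H}_k^{(N)} \tilde{g}_k^{(i)}\|^3 \le \frac{M^3}{m} \|\tilde{g}_k^{(i)}\| \, (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)}. \tag{28}$$

Substituting the relations (27) and (28) in relation (26) and
factoring we get

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \sigma \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j + (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)} \left[ -(1 - \sigma) + \frac{L M^3}{6m} \|\tilde{g}_k^{(i)}\| + \frac{\bar{\rho}^{N+1} + 1}{2} \right].$$

Use $\|\tilde{g}_k^{(i)}\| \le \|g_k\| \le \frac{6m}{L M^3} \left( \frac{1 - \bar{\rho}^{N+1}}{2} - \sigma \right)$ to
write

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \sigma \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j + (\tilde{g}_k^{(i)})' \bar{H}_k^{(N)} \tilde{g}_k^{(i)} \left[ -(1 - \sigma) + \frac{1 - \bar{\rho}^{N+1}}{2} - \sigma + \frac{\bar{\rho}^{N+1} + 1}{2} \right].$$

Algebraic simplification of the bracketed portion yields

$$-(1 - \sigma) + \frac{1 - \bar{\rho}^{N+1}}{2} - \sigma + \frac{\bar{\rho}^{N+1} + 1}{2} = 0. \tag{29}$$

Thus we have

$$\tilde{q}_i(1) - \tilde{q}_i(0) \le \sigma \sum_{j \in \mathcal{N}_i^{(N)}} g_k^j d_k^j.$$

Substituting the definition of $\tilde{q}_i(\alpha)$ in (19) into this equation
we arrive at

$$q_i(\lambda_k + \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \sigma \sum_{j \in \mathcal{N}_i^{(N)}} d_k^j g_k^j,$$

which means that the exit condition (12) in Algorithm 2 is
met with $\alpha = 1$. ■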

Theorem 1 guarantees that for an appropriately chosen line
search parameter $\sigma$ the local backtracking line search will
always choose a step size of $\alpha = 1$ once the norm of the
dual gradient becomes small. Furthermore, the condition on
the line search parameter tells us that $\bar{\rho}$ and our choice of N
fully capture the impact of distributing the line search. The
distributed Armijo rule requires $(1 - \bar{\rho}^{N+1} - 2\sigma) > 0$ while
the standard Armijo rule requires $(1 - 2\sigma) > 0$. It is clear
that in the limit $N \to \infty$ these conditions become the same
with a rate controlled by $\bar{\rho}$.
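
The two conditions of Theorem 1 are easy to evaluate numerically. The sketch below does so for hypothetical constants ($\bar{\rho} = 0.5$, $m = 1$, $M = 2$, $L = 1$); the function names are ours and the numbers carry no meaning beyond the illustration.

```python
def sigma_upper_bound(rho_bar, N):
    """Largest admissible search parameter: sigma < (1 - rho_bar^(N+1)) / 2."""
    return (1.0 - rho_bar ** (N + 1)) / 2.0

def unit_step_gradient_threshold(m, M, L, rho_bar, N, sigma):
    """Gradient norm below which Theorem 1 guarantees a unit stepsize."""
    return 3.0 * m / (L * M ** 3) * (1.0 - rho_bar ** (N + 1) - 2.0 * sigma)

# Hypothetical constants chosen only to exercise the formulas.
bound_n1 = sigma_upper_bound(0.5, 1)     # 0.375
bound_n3 = sigma_upper_bound(0.5, 3)     # 0.46875, approaching 1/2 as N grows
threshold = unit_step_gradient_threshold(m=1.0, M=2.0, L=1.0, rho_bar=0.5, N=1, sigma=0.25)
```

Increasing N widens the admissible range of $\sigma$ toward the centralized interval $(0, 1/2)$, in line with the limit discussed above.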

B. Strict Decrease Phase 

A second fundamental property of the backtracking line
search with the Armijo rule is that there is a strict decrease
in the objective when iterates are outside of an arbitrary
noninfinitesimal neighborhood of the optimal solution. This
property is necessary to ensure global convergence of Newton's
algorithm as it ensures the quadratic convergence phase
is eventually reached. Our goal here is to prove that this
strict decrease can also be achieved using the distributed
backtracking line search specified by Algorithm 2.

Traditional analysis of the centralized backtracking line
search of Algorithm 1 leverages a lower bound on the stepsize
$\alpha$ to prove strict decrease. We take the same approach here
and begin by finding a global lower bound on the stepsize
$\hat{\alpha} \le \alpha_i$ that holds for all nodes $i$. We do this in the following
lemma.

Lemma 2. Consider the distributed line search in Algorithm 2
with parameter N, starting point $\lambda = \lambda_k$, and descent
direction $d = d_k^{(N)} = -\bar{H}_k^{(N)} g_k$ computed by the ADD-N
algorithm [cf. (6) and (7)]. The stepsize

$$\hat{\alpha} = 2(1 - \sigma) \frac{m^2}{M^2}$$

satisfies the local Armijo rule in (12), i.e.,

$$q_i(\lambda_{k+1}) \le q_i(\lambda_k) + \sigma \hat{\alpha} \sum_{j \in \mathcal{N}_i^{(N)}} d_k^j g_k^j$$

for all network nodes $i$ and all $k$.



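Proof: From the mean value theorem centered at $\lambda_k$ we
can write the dual function's value as

$$q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) = q_i(\lambda_k) + \alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} + \frac{\alpha^2}{2} (\tilde{d}_k^{(i)})' H^{(i)}(z) \tilde{d}_k^{(i)},$$

where the vector $z = \lambda_k + t \alpha \tilde{d}_k^{(i)}$ for some $t \in (0, 1)$; see,
e.g., [4, Section 9.1]. We use the relation $0 \preceq H^{(i)} \preceq H$ and
the bound $\|H^{-1}\| \ge m$ from our assumptions to transform this
equality into the bound

$$q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} + \frac{\alpha^2}{2m} \|\tilde{d}_k^{(i)}\|^2.$$

Introduce now a splitting of the term $\alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)}$ to generate
convenient structure

$$q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \sigma \alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} + \alpha (1 - \sigma) (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} + \frac{\alpha^2}{2m} \|\tilde{d}_k^{(i)}\|^2.$$

Further apply the definition of the local update vector
$\tilde{d}_k^{(i)} = -\bar{H}_k^{(N)} \tilde{g}_k^{(i)}$ and use the well-conditioning of the approximate
inverse Hessian $\bar{H}_k^{(N)}$ as per Assumption 2 to claim that
$m \le \|\bar{H}_k^{(N)}\| \le M$ and obtain

$$q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \sigma \alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} - \alpha (1 - \sigma) m \|\tilde{g}_k^{(i)}\|^2 + \frac{\alpha^2 M^2}{2m} \|\tilde{g}_k^{(i)}\|^2.$$

Factoring common terms in this latter equation yields

$$q_i(\lambda_k + \alpha \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \sigma \alpha (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)} + \alpha m \|\tilde{g}_k^{(i)}\|^2 \left[ -(1 - \sigma) + \frac{\alpha M^2}{2 m^2} \right].$$

Substitute $\hat{\alpha}$ for $\alpha$ in this inequality. Observe that by doing
so we have $[-(1 - \sigma) + \hat{\alpha} M^2 / (2 m^2)] = 0$, implying that the
second term vanishes from this expression. Therefore

$$q_i(\lambda_k + \hat{\alpha} \tilde{d}_k^{(i)}) \le q_i(\lambda_k) + \sigma \hat{\alpha} (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)}.$$

From the definitions of the local gradient $\tilde{g}_k^{(i)}$ as the sparse
vector with nonzero elements $[\tilde{g}_k^{(i)}]_j = g_k^j$ for $j \in \mathcal{N}_i^{(N)}$ and
the local update vector $\tilde{d}_k^{(i)} = -\bar{H}_k^{(N)} \tilde{g}_k^{(i)}$ the desired result
follows. ■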

We proceed with the second main result using Lemma 2
in the same manner that strict decrease is proven for the
Newton method with the standard backtracking line search
in [4, Section 9.5].

Theorem 2. Consider the distributed line search in Algorithm 2
with parameter N, starting point $\lambda = \lambda_k$, and descent
direction $d = d_k^{(N)} = -\bar{H}_k^{(N)} g_k$ computed by the ADD-N
algorithm [cf. (6) and (7)]. If the norm of the dual gradient
is bounded away from zero as $\|g_k\| \ge \eta$, the function value
at $\lambda_{k+1} = \lambda_k + \alpha_k d_k$ satisfies

$$q(\lambda_{k+1}) - q(\lambda_k) \le -\beta \hat{\alpha} \sigma m N \eta^2,$$

i.e., the dual function decreases by at least $\beta \hat{\alpha} \sigma m N \eta^2$.

Proof: According to Lemma 2 we have

$$q_i(\lambda_{k+1}) - q_i(\lambda_k) \le \sigma \hat{\alpha} (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)}$$

because $\hat{\alpha}$ is a lower bound on $\alpha_i$. Therefore, Algorithm 2
exits with $\alpha \in (\beta \hat{\alpha}, \hat{\alpha})$ and any $\alpha \le \alpha_i$ satisfies the exit
condition in (12), therefore

$$q_i(\lambda_{k+1}) - q_i(\lambda_k) \le \beta \hat{\alpha} \sigma (\tilde{g}_k^{(i)})' \tilde{d}_k^{(i)}.$$

Applying Assumption 2 with the definition of $\tilde{d}_k^{(i)}$ we get

$$q_i(\lambda_{k+1}) - q_i(\lambda_k) \le -\beta \hat{\alpha} \sigma m \|\tilde{g}_k^{(i)}\|^2.$$

Summing over all $i$,

$$q(\lambda_{k+1}) - q(\lambda_k) \le -\beta \hat{\alpha} \sigma m \sum_{i=1}^{n} \|\tilde{g}_k^{(i)}\|^2.$$

Using the definition of the 2-norm we can write
$\sum_{i=1}^{n} \|\tilde{g}_k^{(i)}\|^2 = \sum_{i=1}^{n} \sum_{j \in \mathcal{N}_i^{(N)}} (g_k^j)^2$. Counting the
appearance of each $(g_k^j)^2$ term in this sum we have that
$\sum_{i=1}^{n} \sum_{j \in \mathcal{N}_i^{(N)}} (g_k^j)^2 = \sum_{i=1}^{n} |\mathcal{N}_i^{(N)}| (g_k^i)^2$. Notice however
that since the network is connected it must be $|\mathcal{N}_i^{(N)}| \ge N$,
from where it follows $\sum_{i=1}^{n} \sum_{j \in \mathcal{N}_i^{(N)}} (g_k^j)^2 \ge N \sum_{i=1}^{n} (g_k^i)^2$.
Substituting this expression into the above equation yields

$$q(\lambda_{k+1}) - q(\lambda_k) \le -\beta \hat{\alpha} \sigma m N \sum_{i=1}^{n} (g_k^i)^2.$$

Observe now that $\sum_{i=1}^{n} (g_k^i)^2 = \|g_k\|^2$ and substitute the
lower bound $\eta \le \|g_k\|$ to obtain the desired relation. ■

Fig. 1. The distributed line search results in solution trajectories nearly
equivalent to those of the centralized line search. Top: the Primal Objective
follows a similar trajectory in both cases. Middle: Primal Feasibility is
achieved asymptotically. Bottom: unit stepsize is achieved in roughly the
same number of steps.

Fig. 2. The distributed line search reaches unit stepsize in 2 to 3 iterations.
Fifty simulations were done for each algorithm with N=1, N=2 and N=3 and
for networks with 25 nodes and 100 edges (small), 50 nodes and 200 edges
(medium) and 100 nodes and 400 edges (large).

Theorem 2 guarantees global convergence into any error
neighborhood $\|g_k\| \le \eta$ around the optimal value because
the dual objective is strictly decreasing by, at least, the
noninfinitesimal quantity $\beta \hat{\alpha} \sigma m N \eta^2$ while we remain outside
of this neighborhood. In particular, we are guaranteed
to reach a point inside the neighborhood
$\|g_k\| \le \eta = \frac{3m}{L M^3} (1 - \bar{\rho}^{N+1} - 2\sigma)$, at which point Theorem 1
applies and the ADD-N algorithm with the local line search
becomes simply

$$\lambda_{k+1} = \lambda_k - \bar{H}_k^{(N)} g_k.$$

This iteration is shown to have quadratic convergence properties
in [18].
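
The stepsize lower bound of Lemma 2 and the per-iteration decrease of Theorem 2 can be evaluated directly. The sketch below uses hypothetical constants ($m = 1$, $M = 2$, $\sigma = 0.25$, $\beta = 0.5$, $N = 1$, $\eta = 0.1$) and also checks that the bracketed term in the proof of Lemma 2 vanishes at $\hat{\alpha}$; the function names are ours.

```python
def stepsize_lower_bound(m, M, sigma):
    """alpha_hat = 2 (1 - sigma) m^2 / M^2 from Lemma 2."""
    return 2.0 * (1.0 - sigma) * m ** 2 / M ** 2

def guaranteed_decrease(m, M, sigma, beta, N, eta):
    """Lower bound on q(lam_k) - q(lam_{k+1}) from Theorem 2 while ||g_k|| >= eta."""
    return beta * stepsize_lower_bound(m, M, sigma) * sigma * m * N * eta ** 2

# Hypothetical constants chosen only to exercise the formulas.
m, M, sigma = 1.0, 2.0, 0.25
alpha_hat = stepsize_lower_bound(m, M, sigma)
bracket = -(1.0 - sigma) + alpha_hat * M ** 2 / (2.0 * m ** 2)   # vanishes at alpha_hat
decrease = guaranteed_decrease(m, M, sigma, beta=0.5, N=1, eta=0.1)
```

With these numbers $\hat{\alpha} = 0.375$, so a backtracking factor of $\beta = 0.5$ needs at most two reductions from $\alpha = 1$ before the local exit condition is guaranteed.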

V. Numerical results 

Numerical experiments demonstrate that the distributed 
version of the backtracking line search is functionally equiv- 
alent to the centralized backtracking line search when the 
descent direction is chosen by the ADD method. The simulations
use networks generated by adding edges uniformly at random,
restricted to connected networks.
The primal objective function is given by $\phi_e(x) = e^{c x^e} + e^{-c x^e}$,
where $c$ captures the notion of edge capacity. For
simplicity we let $c = 1$ for all edges.
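
A test network of this kind can be generated along the following lines. Note that this sketch swaps in a different technique from the one described: it first draws a random spanning tree to guarantee connectivity and then adds the remaining edges uniformly at random, rather than filtering random graphs for connectivity. The function names, the seed handling and the edge orientation are hypothetical.

```python
import math
import random

def phi(x, c=1.0):
    """Edge cost from this section: exp(c x) + exp(-c x)."""
    return math.exp(c * x) + math.exp(-c * x)

def random_connected_network(n_nodes, n_edges, seed=0):
    """Directed edge list whose underlying undirected graph is connected."""
    if not n_nodes - 1 <= n_edges <= n_nodes * (n_nodes - 1) // 2:
        raise ValueError("edge count incompatible with a simple connected graph")
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    edges = set()
    for k in range(1, n_nodes):                  # spanning tree: attach each new node
        parent = order[rng.randrange(k)]         # to a node that is already connected
        edges.add((parent, order[k]))
    while len(edges) < n_edges:                  # remaining edges uniformly at random
        i, j = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if i != j and (i, j) not in edges and (j, i) not in edges:
            edges.add((i, j))
    return sorted(edges)

edges = random_connected_network(25, 100, seed=0)   # the "small" size used in Fig. 2
```

The resulting edge list can be turned into an incidence matrix and paired with source and sink rates that sum to zero to obtain a feasible instance of (1).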

Figure 1 shows an example of a network optimization
problem with 25 nodes and 100 edges being solved using
ADD-1 with the centralized and distributed backtracking line
searches. The top plot shows that the trajectory of the primal
objective is not significantly affected by the choice of line search.
The middle plot shows that primal feasibility is approached 
asymptotically at the same rate for both algorithms. The 
bottom plot shows that a unit stepsize is achieved in roughly 
the same number of steps. 

In Figure 2 we look closer at the number of steps required
to reach a unit stepsize. We compare the distributed back- 
tracking line search to its centralized counterpart on networks 
with 25 nodes and 100 edges, 50 nodes and 200 edges and 
100 nodes and 400 edges. For each network optimization 
problem generated we implemented distributed optimization 
using ADD-1, ADD-2, and ADD-3. Most trials required only 
2 or 3 iterations to reach a = 1 for both the centralized 
and distributed line searches. The variation came from the
few trials which required significantly more iterations. As
might be expected, increasing N causes the distributed and
centralized algorithms to behave closer to each other. When
we increase the size of the network most trials still only 
require 2 to 3 iterations to reach a = 1 but for the cases 
which take more than 2 iterations we jump from around 10 
iterations in the 25 nodes networks to around 40 iterations 
in 100 node networks. 

VI. Conclusion 

We presented an alternative version of the backtracking 
line search using a local version of the Armijo rule which 
allows the stepsize for the dual update in the single com- 
modity network flow problem to be computed using only 
local information. When this distributed backtracking line 
search technique is paired with the ADD method for selecting 
the dual descent direction we recover the key properties of 
the standard centralized backtracking line search: a strict 
decrease in the dual objective and unit stepsize in a region 
around the optimal. We use simulations to demonstrate 
that the distributed backtracking line search is functionally 
equivalent to its centralized counterpart. 

This work focuses on line searches when the ADD-N
method is used to select the descent direction; however, the
proof method relies primarily on the sparsity structure of
the inverse Hessian approximation. This implies that our line
search method could be applied with other descent directions
provided they themselves depend only on local
information.